Disambiguating Verbs by Collocation: Corpus Lexicography meets Natural Language Processing

Authors

  • Ismaïl El Maarouf
  • Jane Bradbury
  • Vít Baisa
  • Patrick Hanks
Abstract

This paper reports the results of Natural Language Processing (NLP) experiments in semantic parsing, based on a new semantic resource, the Pattern Dictionary of English Verbs (PDEV) (Hanks, 2013). The work is part of the DVC (Disambiguating Verbs by Collocation) project, which aims to expand PDEV on a large scale. The project springs from a long-term collaboration between lexicographers and computer scientists, which has given rise to the design and maintenance of specific, adapted, and user-friendly editing and exploration tools. Particular attention is paid to the use of deep semantic NLP methods to help in data processing. Possible contributions of NLP include pattern disambiguation, the focus of this article. The article explains how PDEV differs from other lexical resources and describes its structure in detail. It also presents new classification experiments on a subset of 25 verbs, in which the SVM model obtained a micro-average F1 score of 0.81.
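For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a micro-average F1 score can be computed for a pattern-disambiguation task. It is not the authors' evaluation code: the function name `micro_f1`, the pattern labels `p1` to `p3`, and the toy data are all hypothetical, and it assumes each verb instance receives exactly one gold and one predicted pattern label.

```python
from collections import Counter


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool TP/FP/FN over all labels, then compute F1 once."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct label: true positive for that label
        else:
            fp[p] += 1          # predicted label was wrongly assigned
            fn[g] += 1          # gold label was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    if total_tp == 0:
        return 0.0
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical pattern labels for six uses of one verb (toy data, not from PDEV).
gold = ["p1", "p1", "p2", "p3", "p2", "p1"]
predicted = ["p1", "p2", "p2", "p3", "p1", "p1"]
score = micro_f1(gold, predicted)
print(round(score, 3))
```

Because the counts are pooled before the score is computed, frequent patterns weigh more heavily than rare ones. Under the one-label-per-instance assumption used here, the micro-average F1 coincides with plain accuracy.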


Similar resources

A Synchronous Corpus-Based Study of Verb-Noun Fluidity in Chinese

The problem of verb-noun categorial ambiguity is critical and relatively unique for non-inflectional languages, especially Chinese. We consider the verb-noun categorial fluidity a continuum and any categorial shift a transitional process. A synchronous corpus-based study was conducted to compare the phenomenon with respect to news texts collected from Hong Kong, Beijing, and Taiwan. It was foun...


The Application of Fuzzy Logic to Collocation Extraction

Collocations are important for many tasks in Natural Language Processing, such as information retrieval, machine translation, and computational lexicography. So far, many statistical methods have been used for collocation extraction. Almost all of these methods form a classical crisp set of collocations. We propose a fuzzy logic approach to collocation extraction to form a fuzzy set of collocations in ...


International Workshop Natural Language Processing Methods and Corpora in Translation, Lexicography, and Language Learning

TerminoWeb is a web-based platform designed to find and explore specialized domain knowledge on the Web. An important aspect of this exploration is the discovery of domain-specific collocations on the Web and their presentation in a concordancer to provide contextual information. Such information is valuable to a translator or a language learner presented with a source text containing a specifi...


SemEval-2015 Task 15: A CPA dictionary-entry-building task

This paper describes the first SemEval task to explore the use of Natural Language Processing systems for building dictionary entries, in the framework of Corpus Pattern Analysis. CPA is a corpus-driven technique which provides tools and resources to identify and represent unambiguously the main semantic patterns in which words are used. Task 15 draws on the Pattern Dictionary of English Verbs ...


On Delexicalization Features of Light Verbs in Mandarin

Delexicalization is the tendency of meaning, reflecting the structural feature of light verbs. It emerges from collocation usages of semantic sharing. Based on the corpus and exemplified with “ ” and “ ”, the current paper addresses the delexicalizing features of Chinese light verbs, while proposing a lexicographic approach concerning the conventionality of light verb construction and endeavori...




Publication date: 2014